{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multinode Inference on Polaris"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Many flavors of modern LLMs are prohibitively large to serve on your local hardware. To that end, this tutorial will demonstrate how you can run inference on Llama 3.3 70B using the Polaris cluster. As a reminder, Polaris is composed of hundreds of nodes, each composed of 4 x A100-40GB GPUs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Prerequisites"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Oumi Installation\n",
    "First, let's install Oumi. You can find detailed instructions [here](https://github.com/oumi-ai/oumi/blob/main/README.md), but it should be as simple as:\n",
    "\n",
    "```bash\n",
    "pip install oumi\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating our Working Directory\n",
    "For this tutorial, we'll use the following folder to save our generated dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "tutorial_dir = \"polaris_inference_tutorial\"\n",
    "\n",
    "Path(tutorial_dir).mkdir(parents=True, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Prepare Your Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our inference pipeline currently expects inputs using OpenAI's chat format. Let's download a dataset from HuggingFace and massage it into the proper format. We'll use a small subset of the `cais/mmlu` as an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datasets\n",
    "\n",
    "# Optional system context we'll use when creating our dataset\n",
    "system_context = \"You are a helpful AI assistant.\"\n",
    "\n",
    "# This dataset has only 100 examples.\n",
    "dset = datasets.load_dataset(\"cais/mmlu\", \"abstract_algebra\", split=\"test\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's massage the data and save it as a JSONL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from pprint import pprint\n",
    "\n",
    "data_location = Path(tutorial_dir) / \"data.jsonl\"\n",
    "with open(str(data_location), \"w\") as f:\n",
    "    for data in dset:\n",
    "        system_message = {\"role\": \"system\", \"content\": system_context}\n",
    "        user_content = \"\\n\".join([data[\"question\"], \"Choices: \", *data[\"choices\"]])\n",
    "        user_prompt = {\"role\": \"user\", \"content\": user_content}\n",
    "        entry = {\"messages\": [system_message, user_prompt]}\n",
    "        print(json.dumps(entry), file=f)\n",
    "\n",
    "print(\"Sample entry:\")\n",
    "pprint(entry)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setting up our Job"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have a predefined job for running inference on Llama3.3-70B-Instruct. You can find it at `configs/examples/misc/vllm_polaris_job.yaml`.\n",
    "\n",
    "This job accepts an input JSONL file of conversations and an output directory for writing our results. Note: your output dir must be on one of Polaris' file systems (we recommend `/eagle/community_ai/$USER`).\n",
    "\n",
    "First, let's load the config:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import oumi.launcher as launcher\n",
    "\n",
    "job_name = \"YOUR_JOB_NAME\"\n",
    "polaris_user = \"YOUR_POLARIS_USERNAME\"\n",
    "\n",
    "# We assume you're running this notebook in the /notebooks directory.\n",
    "# Move up one directory to run the job from the root of the repository.\n",
    "os.chdir(Path(tutorial_dir).absolute().parent.parent)\n",
    "job_path = Path() / \"configs\" / \"examples\" / \"misc\" / \"vllm_polaris_job.yaml\"\n",
    "\n",
    "job = launcher.JobConfig.from_yaml(str(job_path))\n",
    "job.name = job_name\n",
    "job.user = polaris_user"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's specify the inputs for the job, such as the model to run inference on and the input path:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your input path should be a relative path from the working directory.\n",
    "input_filepath = str(Path(\"notebooks\") / data_location)\n",
    "\n",
    "# Write the output to Polaris in a directory named after the job and user.\n",
    "output_dir = str(Path(\"/eagle\") / \"community_ai\" / polaris_user / job_name)\n",
    "\n",
    "# Set the input and output paths in the job environment.\n",
    "job.envs[\"REPO\"] = \"meta-llama\"\n",
    "job.envs[\"MODEL\"] = \"Llama-3.3-70B-Instruct\"\n",
    "job.envs[\"OUMI_VLLM_INPUT_FILEPATH\"] = input_filepath\n",
    "job.envs[\"OUMI_VLLM_OUTPUT_DIR\"] = output_dir\n",
    "job.envs[\"OUMI_VLLM_NUM_WORKERS\"] = str(10)  # Samples will be divided amongst workers\n",
    "job.envs[\"OUMI_VLLM_WORKERS_SPAWNED_PER_SECOND\"] = str(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: You can run 70B inference on a single node in Polaris using the 70B.w8a8 (Int8 quantized) version from neuralmagic."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Running Inference"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With our job set up, we can kick off inference on Polaris!\n",
    "\n",
    "**IMPORTANT** Note that you'll be required to input your Polaris credentials twice. Make sure you refresh your credentials between each input or copying your files will fail."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The cluster for Polaris jobs must be of the form `queue_name.user_name`.\n",
    "# The following will use the `debug` queue.\n",
    "cluster, job_status = launcher.up(job, f\"debug.{polaris_user}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we just need to wait for our job to finish. We can check our job's status using the following command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "while not job_status.done:\n",
    "    job_status = cluster.get_job(job_status.id)\n",
    "    print(f\"Job status: {job_status}\")\n",
    "    time.sleep(30)\n",
    "\n",
    "print(f\"Job finished with status: {job_status.status}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When your job is done, you can find the job outputs by SSHing into Polaris and navigating to your output directory: `/eagle/community_ai/YOUR_USER_NAME/YOUR_JOB_NAME`. You should see two files corresponding to the inference output and metrics, beginning with the job id being printed in the above cell.\n",
    "\n",
    "Run the cell below to find the exact location for your jobs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Run the following on Polaris to find the job output files:\")\n",
    "print(f\"ls {output_dir}/{job_status.id}_vllm_*.jsonl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output of inference will be a JSONL in the standard OpenAI chat format."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Advanced Setup\n",
    "\n",
    "Depending on the size of your input, your job may require more time to run. Polaris requires jobs to set a max run time when queued. To adjust this, navigate to our job config and adjust the line containing the `#PBS -l walltime` directive to the time required for your run.\n",
    "\n",
    "For example, the following will configure the job to terminate after 10 minutes of run time:\n",
    "`#PBS -l walltime=00:10:00` \n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "oumi",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
